Metadata Extraction using Text Mining

Authors

  • Shivani Seth
  • Stefan Rüping
  • Stefan Wrobel
Abstract

Grid technologies have proven very successful in eScience, and in healthcare in particular, because they make it easy to combine proven solutions for data querying, integration, and analysis into a secure, scalable framework. Integrating the services that implement these solutions into a given Grid architecture requires some metadata, for example information about low-level access to these services, security information, and some documentation for the user. In this paper, we investigate how relevant metadata can be extracted, using text mining methods, from semi-structured textual documentation of the algorithm underlying the service. In particular, we investigate the semi-automatic conversion of functions of the statistical environment R into Grid services, as implemented by the GridR tool, through the generation of appropriate metadata.
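
As a rough illustration of the idea in the abstract, the sketch below pulls a function name, a title, and argument descriptions out of a semi-structured documentation text. It is not the method of the paper and does not use GridR. The documentation snippet, the function `extract_metadata`, and the output field names are all hypothetical, and the input is assumed to follow the flat `\name{...}`, `\title{...}`, `\item{...}{...}` layout common in R help files, with no nested braces.

```python
import re

# Hypothetical documentation text in an Rd-like layout (an assumption
# made for this example, not taken from the paper).
RD_TEXT = r"""
\name{kmeans_demo}
\title{Cluster numeric data}
\arguments{
  \item{x}{numeric matrix of data}
  \item{centers}{number of clusters}
}
"""


def extract_metadata(rd_text):
    """Collect simple service metadata from Rd-like text using regexes.

    Only flat fields without nested braces are handled; every \\item
    pair in the text is treated as an input argument.
    """
    name = re.search(r"\\name\{([^}]*)\}", rd_text)
    title = re.search(r"\\title\{([^}]*)\}", rd_text)
    items = re.findall(r"\\item\{([^}]*)\}\{([^}]*)\}", rd_text)
    return {
        "name": name.group(1).strip() if name else None,
        "title": title.group(1).strip() if title else None,
        "inputs": [
            {"name": arg.strip(), "description": desc.strip()}
            for arg, desc in items
        ],
    }


metadata = extract_metadata(RD_TEXT)
print(metadata)
```

A pattern-based extractor like this only covers the structured parts of the documentation; free-text fields such as usage notes would need the text mining methods the abstract refers to.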

Similar resources

Text Mining

“Bag of words” model, acronym extraction, authorship ascription, coordinate matching, data mining, document clustering, document frequency, document retrieval, document similarity metrics, entity extraction, hidden Markov models, hubs and authorities, information extraction, information retrieval, key-phrase assignment, key-phrase extraction, knowledge engineering, language identification, link...

A Temporal Text Mining Application in Competitive Intelligence

In this paper we describe an application of our approach to temporal text mining in Competitive Intelligence for the biotechnology and pharmaceutical industry. The main objective is to identify changes and trends of associations among entities of interest that appear in text over time. Text Mining (TM) exploits information contained in textual data in various ways, including the type of analyse...

A Document Engineering Approach to Automatic Extraction of Shallow Metadata from Scientific Publications

Semantic metadata can be considered one of the foundational blocks of the Semantic Web and Desktop. This report describes a solution for automatic metadata extraction from scientific publications, published as PDF documents. The proposed algorithms follow a low-level document engineering approach, by combining mining and analysis of the publications’ text based on its formatting style and font ...

A Rough-Set-Refined Text Mining Approach for Crude Oil Market Tendency Forecasting

In this study, we propose a knowledge-based forecasting system — rough-set-refined text mining (RSTM) approach — for crude oil price tendency forecasting. This system consists of two modules. In the first module, text mining techniques are used to construct a metadata repository and generate rough knowledge by extracting unstructured text documents, including gathering various related text docu...

Metadata extraction and text categorization using Universal Resource Locator expansions

Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can indicate metadata about a resource. This paper explores the mining of URLs to yield categoric metadata about web resources via a three-phase pipeline of word segmentation, abbreviation expansion and classification. I apply this approach to the problem of subject metadat...

Journal:
  • Studies in health technology and informatics

Volume 147

Publication date: 2009